Despite excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for their enormous storage requirements and intensive computation. As a proven performance booster, knowledge distillation has been demonstrated to be particularly efficacious in exploring low-cost GANs. In this paper, we investigate the irreplaceability of the teacher discriminator and present a novel discriminator-cooperated distillation, abbreviated as DCD, for refining better feature maps from the generator. In contrast to conventional pixel-to-pixel matching methods in feature-map distillation, our DCD uses the teacher discriminator as a transformation to drive intermediate results of the student generator to be perceptually close to the corresponding outputs of the teacher generator. Furthermore, to mitigate mode collapse in GAN compression, we construct a collaborative adversarial training paradigm in which the teacher discriminator is trained from scratch together with the student generator, in company with our DCD. Our DCD shows superior results compared with existing GAN compression methods. For instance, after reducing MACs by over 40x and parameters by over 80x on CycleGAN, we decrease the FID metric from 61.53 to 48.24, while the current SoTA method only reaches 51.92. The source code is available at https://github.com/poopit/DCD-official.
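The core idea of discriminator-cooperated matching could be sketched as follows. This is a minimal illustration with hypothetical shapes and a toy stand-in for the discriminator's feature stack; the paper's actual layers, loss weights, and training loop will differ.

```python
# Sketch of discriminator-cooperated feature matching (hypothetical
# names/shapes). The teacher discriminator maps an image to a stack of
# feature maps; the distillation loss compares those features for the
# student generator's output against the teacher generator's output,
# rather than comparing pixels directly.

def disc_features(image, layers):
    """Run `image` (a flat list of floats) through fixed linear+ReLU
    layers standing in for the teacher discriminator's feature stack."""
    feats, x = [], image
    for weights in layers:  # each layer: list of weight rows
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
        feats.append(x)
    return feats

def dcd_loss(student_out, teacher_out, layers):
    """Mean absolute distance between discriminator features of the
    student's and teacher's generated images (perceptual, not pixel-wise)."""
    fs = disc_features(student_out, layers)
    ft = disc_features(teacher_out, layers)
    diffs = [abs(a - b) for s, t in zip(fs, ft) for a, b in zip(s, t)]
    return sum(diffs) / len(diffs)
```

When the student's output matches the teacher's, the loss vanishes; otherwise the penalty is measured in the discriminator's feature space rather than in pixel space.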
This paper proposes content relationship distillation (CRD) to tackle over-parameterized generative adversarial networks (GANs) for serviceability on cutting-edge devices. In contrast to traditional instance-level distillation, we design a novel GAN-compression-oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities, such as row/column strips (global information) and image patches (local information), modeling the relationships among them, such as pairwise distance and triplet-wise angle, and encouraging the student to capture these relationships within its output contents. Built upon our proposed content-level distillation, we also deploy an online teacher discriminator, which keeps updating when co-trained with the teacher generator and is kept frozen when co-trained with the student generator, for better adversarial training. We perform extensive experiments on three benchmark datasets; the results show that our CRD achieves the greatest complexity reduction on GANs while obtaining the best performance in comparison with existing methods. For example, we reduce the MACs of CycleGAN by around 40x and its parameters by over 80x, while obtaining an FID of 46.61, compared with 51.92 for the current state of the art. Code of this project is available at https://github.com/TheKernelZ/CRD.
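The relationship-over-pixels idea can be sketched in a few lines. The sketch below uses only row strips and pairwise mean-intensity distance as the relation; the paper also slices column strips and patches and models triplet-wise angles, so treat the granularity and metric here as hypothetical.

```python
# Sketch of content-relationship distillation on one granularity
# (row strips); the actual method uses several granularities and metrics.

def row_strips(img, strip_h):
    """Slice a 2D image (list of rows) into horizontal strips."""
    return [img[i:i + strip_h] for i in range(0, len(img), strip_h)]

def pairwise_relations(strips):
    """Pairwise absolute distance between mean intensities of strips."""
    means = [sum(sum(r) for r in s) / sum(len(r) for r in s) for s in strips]
    return [[abs(a - b) for b in means] for a in means]

def crd_loss(teacher_img, student_img, strip_h=2):
    """Encourage the student to reproduce the teacher's content
    relationships rather than its exact pixels."""
    rt = pairwise_relations(row_strips(teacher_img, strip_h))
    rs = pairwise_relations(row_strips(student_img, strip_h))
    return sum(abs(a - b) for rowt, rows in zip(rt, rs)
               for a, b in zip(rowt, rows)) / (len(rt) ** 2)
```

Note that a student output shifted by a constant still incurs zero loss, since the relations among its contents are unchanged; this is exactly how relation-level knowledge is looser than pixel-to-pixel matching.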
Low Earth Orbit (LEO) constellations, each comprising a large number of satellites, have become a new source of big data "from the sky". Downloading such data to a ground station (GS) for big data analytics demands very high bandwidth and involves large propagation delays. Federated Learning (FL) offers a promising solution because it allows data to stay in situ (never leaving satellites) and only needs to transmit machine learning model parameters (trained on the satellites' data). However, the conventional, synchronous FL process can take several days to train a single FL model in the context of satellite communication (Satcom), due to a bottleneck caused by straggler satellites. In this paper, we propose an asynchronous FL framework for LEO constellations called AsyncFLEO to improve FL efficiency in Satcom. Not only does AsyncFLEO address the bottleneck (idle waiting) in synchronous FL, but it also solves the issue of model staleness caused by straggler satellites. AsyncFLEO utilizes high-altitude platforms (HAPs) positioned "in the sky" as parameter servers, and consists of three technical components: (1) a ring-of-stars communication topology, (2) a model propagation algorithm, and (3) a model aggregation algorithm with satellite grouping and staleness discounting. Our extensive evaluation with both IID and non-IID data shows that AsyncFLEO outperforms the state of the art by a large margin, cutting down convergence delay by 22 times and increasing accuracy by 40%.
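The staleness-discounting idea in the aggregation component can be sketched as a weighted average where older updates count for less. The exponential decay rule below is a hypothetical choice for illustration; AsyncFLEO's actual grouping and weighting scheme may differ.

```python
# Sketch of staleness-discounted model aggregation (hypothetical decay
# rule): updates from straggler satellites that are several global
# rounds old are down-weighted instead of blocking the round.

def aggregate(updates, decay=0.5):
    """updates: list of (model_params, staleness) pairs, where params is
    a flat list of floats and staleness counts how many global rounds
    old the update is. Returns the discounted weighted average."""
    weights = [decay ** s for _, s in updates]
    total = sum(weights)
    dim = len(updates[0][0])
    return [sum(w * p[i] for (p, _), w in zip(updates, weights)) / total
            for i in range(dim)]
```

With `decay=0.5`, an update one round stale contributes half as much as a fresh one, so stragglers still inform the global model without dominating it.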
Instance-level image retrieval (IIR), or simply instance retrieval, addresses the problem of finding all images in a dataset that contain a query instance (e.g., an object). This paper makes the first attempt to tackle this problem using contrastive learning (CL) based on instance discrimination. Although CL has shown impressive performance on many computer vision tasks, comparable success has never been found in the IIR domain. In this work, we address this by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models. First, we investigate the efficacy of transfer learning for IIR by comparing features learned by pre-trained deep neural network (DNN) classifiers with those learned by CL models. These findings inspire us to propose a new training strategy that optimizes CL toward learning IIR-oriented features, by fine-tuning with an average precision (AP) loss to learn contrastive feature representations tailored for IIR. Our empirical evaluation demonstrates significant performance improvements over off-the-shelf features learned from pre-trained DNN classifiers on the challenging Oxford and Paris datasets.
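The retrieval objective being optimized here is average precision over a ranked list. The plain (non-differentiable) computation below shows what the AP loss targets; the paper necessarily fine-tunes with a differentiable surrogate, which this sketch does not capture.

```python
# Standard average precision over a ranked retrieval list: precision is
# evaluated at the rank of each relevant image and then averaged.

def average_precision(scores, labels):
    """scores: retrieval score per database image; labels: 1 if the
    image contains the query instance, else 0."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(1, sum(labels))
```

A perfect ranking (all relevant images first) yields AP = 1.0, and every relevant image pushed down the list lowers the score.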
In this paper, we propose Tetris, a new task of goal-oriented script completion. Unlike previous work, it considers a more realistic and general setting, where the input includes not only the goal but also additional user context, including preferences and history. To address this problem with a knowledge-based approach, we introduce the Task Concept Graph, a knowledge base automatically constructed from instructional websites. Different from commonsense knowledge bases such as ConceptNet, the Task Concept Graph schema introduces various noun-phrase-based nodes specifically for accomplishing tasks. To integrate these graphs into script learning, we design two methods for acquiring concepts from the knowledge base to serve as hints for downstream script completion. On our WikiHow-based dataset, we find that incorporating concepts from the Task Concept Graph consistently improves performance, demonstrating its benefit. Furthermore, models given gold-standard concepts substantially outperform the baselines, further confirming the need for task-specific knowledge in goal-oriented script completion. The dataset, repository, models, and demo will be made publicly available to facilitate further research on this new task.
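The concept-as-hint mechanism can be illustrated with a toy lookup. The schema and entries below are entirely hypothetical; the real Task Concept Graph is mined automatically from instructional websites and is far richer than a flat dictionary.

```python
# Toy illustration of retrieving task-specific noun-phrase concepts as
# hints for a downstream script-completion model (hypothetical schema).

TASK_CONCEPT_GRAPH = {
    "bake a cake": ["flour", "oven", "mixing bowl"],
    "plant a tree": ["shovel", "sapling", "watering can"],
}

def concept_hints(goal, graph, k=2):
    """Return up to k concepts linked to the goal; an unknown goal
    simply yields no hints, leaving the model to rely on the goal text."""
    return graph.get(goal, [])[:k]
```

The retrieved concepts would then be concatenated to the model input alongside the goal and user context.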
Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks, such as image classification and object detection. However, their performance degrades rapidly on harder tasks where images are of low resolution or objects are small. In this paper, we point out that this is rooted in a flawed yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and the learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv to replace each strided convolution layer and each pooling layer (thus eliminating them altogether). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most, if not all, CNN architectures. We explain this new design under two of the most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on harder tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/labsaint/spd-conv.
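The SPD layer itself is a pure rearrangement, which makes the key property easy to see in code: unlike a stride-2 convolution or pooling, it discards nothing. A minimal nested-list sketch with scale 2:

```python
# Minimal space-to-depth (SPD) rearrangement: each 2x2 spatial block of
# an H x W x C image becomes 4 channels of one output pixel, so the
# feature map shrinks spatially without discarding any information.

def space_to_depth(img, scale=2):
    h = len(img)
    w = len(img[0])
    out = []
    for i in range(0, h, scale):
        row = []
        for j in range(0, w, scale):
            # concatenate the scale*scale sub-pixels along the channel axis
            row.append([ch for di in range(scale) for dj in range(scale)
                        for ch in img[i + di][j + dj]])
        out.append(row)
    return out
```

In SPD-Conv, a non-strided convolution then follows this layer, so the downsampling decision is learned over all the preserved sub-pixel channels rather than hard-coded by a stride.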
Action recognition is an exciting research avenue for artificial intelligence, since it may be a game changer for emerging industrial fields such as robotic vision and automobiles. However, current deep learning faces major challenges in such applications due to huge computational cost and inefficient learning. Hence, we develop a novel brain-inspired spiking neural network (SNN) based system, titled Spiking Gating Flow (SGF), for online action learning. The developed system consists of multiple SGF units assembled in a hierarchical manner. A single SGF unit involves three layers: a feature extraction layer, an event-driven layer, and a histogram-based training layer. To demonstrate the capability of the developed system, we adopt the standard Dynamic Vision Sensor (DVS) gesture classification as a benchmark. The results indicate that we can achieve 87.5% accuracy, which is comparable with deep learning (DL), but at a smaller training/inference data ratio of 1.5:1. Only a single training epoch is required during the learning process. Meanwhile, to the best of our knowledge, this is the highest accuracy among non-backpropagation-based SNNs. Finally, we summarize the few-shot learning paradigm of the developed network: 1) a hierarchy-based network design that involves human prior knowledge; 2) SNNs for content-based global dynamic feature detection.
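To ground the histogram-based idea, the sketch below accumulates DVS spike events into a spatial count histogram that could serve as a feature. The binning is a hypothetical stand-in; SGF's actual event-driven and gating layers are considerably more involved.

```python
# Sketch of a histogram feature over DVS events (hypothetical binning):
# spikes are accumulated into a coarse spatial grid of counts.

def event_histogram(events, grid=4):
    """events: list of (x, y) spike coordinates in [0, 1)^2.
    Returns a grid x grid matrix of spike counts."""
    hist = [[0] * grid for _ in range(grid)]
    for x, y in events:
        hist[min(grid - 1, int(y * grid))][min(grid - 1, int(x * grid))] += 1
    return hist
```

Such count histograms can be compared or thresholded without backpropagation, which is consistent with the abstract's non-backpropagation training claim.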
Detecting 3D lanes from a camera is a rising problem for autonomous vehicles. In this task, the correct camera pose is the key to generating accurate lanes, as it can transform images from a perspective view to a top view. With this transformation, we can get rid of the perspective effect, so that 3D lanes look similar and can be accurately fitted by low-order polynomials. However, mainstream 3D lane detectors rely on perfect camera poses provided by other sensors, which is expensive and encounters multi-sensor calibration issues. To overcome this problem, we propose predicting 3D lanes by estimating the camera pose from a single image with a two-stage framework. The first stage targets the camera-pose task from a perspective-view image. To improve pose estimation, we introduce an auxiliary 3D lane task and geometric constraints to benefit from multi-task learning, which enhances the consistency between 3D and 2D as well as the compatibility of the above two tasks. The second stage targets the 3D lane task. It uses the previously estimated pose to generate a top view containing distance-invariant lane appearances for predicting accurate 3D lanes. Experiments show that, without ground-truth camera poses, our method outperforms state-of-the-art methods that use perfect camera poses, with the fewest parameters and computations. Code is available at https://github.com/liuruijin17/clgo.
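The payoff of the top-view transformation is that a lane reduces to a low-order polynomial fit. As a minimal sketch, a closed-form least-squares line fit (order 1) is shown below; the method may well use higher orders, and the fit itself is standard rather than specific to this paper.

```python
# Once perspective distortion is removed, a lane in the top view is well
# approximated by a low-order polynomial; here, a closed-form
# least-squares fit of y = a*x + b to lane points.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx
```

In the perspective view the same lane would curve and narrow with distance, which is exactly why the fit is done after the pose-driven top-view projection.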
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, inputs, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) when the depth of the student mismatches that of the teacher, an intermediate layer of the teacher network serves as a better target than the last layer; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 mIoU higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
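Finding 1), that token relations distill better than raw features, can be made concrete: match the student's token-to-token similarity matrix to the teacher's, rather than the token embeddings themselves. The similarity (softmaxed dot products) and L1 loss below are simplified stand-ins; TinyMIM's actual relation targets come from the attention computation.

```python
# Sketch of token-relation distillation (hypothetical similarity and
# loss): compare teacher and student token-to-token relation matrices
# instead of raw per-token features.
import math

def relation_matrix(tokens):
    """Row-wise softmax over dot products between token embeddings."""
    rel = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        rel.append([e / s for e in exps])
    return rel

def relation_loss(teacher_tokens, student_tokens):
    rt = relation_matrix(teacher_tokens)
    rs = relation_matrix(student_tokens)
    return sum(abs(a - b) for rowt, rows in zip(rt, rs)
               for a, b in zip(rowt, rows)) / len(rt) ** 2
```

Because the relation matrix is invariant to many per-token differences between teacher and student embedding spaces, it sidesteps the dimensionality mismatch that makes direct feature matching awkward for small students.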
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.